
Minor assembly directory fix in load-spark-env-sh #5261

Closed
wants to merge 1 commit into master from raschild:patch-1

Conversation

@dstripelis (Contributor)

Set the current directory path in $FWDIR, and likewise in $ASSEMBLY_DIR1 and $ASSEMBLY_DIR2; otherwise $SPARK_HOME is not visible from spark-env.sh -- no SPARK_HOME variable is assigned there.
I am using the Spark 1.3.0 source package and came across this when trying to start the master with sbin/start-master.sh.
@AmplabJenkins

Can one of the admins verify this patch?

@srowen (Member) commented Mar 30, 2015

(You could give this a more specific title than "Update load-spark-env.sh")
This is borderline important enough for a JIRA, but I think we might consider this a minor add-on fix for SPARK-4924, maybe.

I'm not sure about this. For example, spark-class sources this script with . "$SPARK_HOME"/bin/load-spark-env.sh, and pyspark does similarly. So these have SPARK_HOME set.

However, run-example uses . "$FWDIR"/bin/load-spark-env.sh, and scripts in sbin use . "$SPARK_PREFIX/bin/load-spark-env.sh". Clearly they don't necessarily expect SPARK_HOME to be set.
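
Side by side, the call sites being compared (paraphrased from the lines above; the surrounding context of each script is omitted):

# bin/spark-class and bin/pyspark -- SPARK_HOME is already set at this point:
. "$SPARK_HOME"/bin/load-spark-env.sh

# bin/run-example -- uses its own FWDIR instead:
. "$FWDIR"/bin/load-spark-env.sh

# sbin scripts, via spark-daemon.sh -- use SPARK_PREFIX:
. "$SPARK_PREFIX/bin/load-spark-env.sh"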

CC @vanzin since this used to refer to FWDIR actually:
517975d

The lines you reference don't exist in 1.3.0 though. Are you sure you're using 1.3.0?

@vanzin (Contributor) commented Mar 30, 2015

Hmm. I see. spark-daemon.sh calls this script directly, but does not set SPARK_HOME. Makes sense.

One comment: doing the following in a script that is sourced is a little weird:

FWDIR="$(cd "`dirname "$0"`"/..; pwd)"

That's because $0 actually refers to the script that is sourcing this one, not load-spark-env.sh. In Spark, though, the code above probably still works because of where all the scripts are.

I'd put a comment in the code about that, maybe, or use $BASH_SOURCE instead. Other than that, looks like a correct change to me.
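
For illustration, a minimal sketch of the difference (hypothetical files lib.sh and caller.sh, not from the Spark tree):

# lib.sh -- intended to be sourced, not executed
echo "\$0           = $0"                  # the script that sourced us (or the shell)
echo "\$BASH_SOURCE = ${BASH_SOURCE[0]}"   # always the path of lib.sh itself

# caller.sh
. ./lib.sh    # running `bash caller.sh` prints: $0 = caller.sh, $BASH_SOURCE = ./lib.sh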

@dstripelis (Contributor, Author)

To be more precise, I am running Spark on an HPC cluster and built it against Scala 2.11; however, when I started the master, the error I received was
echo "Failed to find Spark assembly in $assembly_folder" 1>&2
echo "You need to build Spark before running this program." 1>&2
which comes from the compute-classpath.sh script.
So I figured that the Scala version was not being passed correctly: when compute-classpath.sh sources
. "$FWDIR"/bin/load-spark-env.sh, the Scala version was assigned explicitly as 2.10, so clearly it could not find the Scala 2.11 assembly in assembly/target/scala-2.11. On closer investigation I saw that when load-spark-env.sh checks whether a scala-2.11 directory exists, $FWDIR did not have a valid value, and it mistakenly reported Scala version 2.10 since it couldn't find the correct path; hence I suggested this change.

And yes, this is Spark 1.3.0; I downloaded the package from the following mirror:
http://www.apache.org/dyn/closer.cgi/spark/spark-1.3.0/spark-1.3.0.tgz

@srowen (Member) commented Apr 2, 2015

@Raschild what about Marcelo's comment about $BASH_SOURCE? Does this affect master, or only 1.3 and before?

@dstripelis (Contributor, Author)

@srowen Yes, you could use something like
FWDIR="$(cd "`dirname "${BASH_SOURCE[0]}"`"/..; pwd)"

@vanzin Yes, you are right; I only used $0 because it is the same as the first command in compute-classpath.sh; I thought it would be wise to keep the commands consistent...

From what I see, it is the same issue in the Spark 1.2.0 package... I haven't looked at previous versions!

@srowen (Member) commented Apr 3, 2015

Yes, I was wondering whether it affects master rather than just older versions, but it looks like it does. It's up to you which approach you'd like to take in the end, but I'm happy to merge your final decision shortly.

@srowen (Member) commented Apr 3, 2015

ok to test

@SparkQA commented Apr 3, 2015

Test build #29665 has started for PR 5261 at commit b9babcd.

@SparkQA commented Apr 3, 2015

Test build #29665 has finished for PR 5261 at commit b9babcd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29665/

@srowen (Member) commented Apr 3, 2015

Jenkins, retest this please.

@SparkQA commented Apr 3, 2015

Test build #29668 has started for PR 5261 at commit b9babcd.

@SparkQA commented Apr 3, 2015

Test build #29668 has finished for PR 5261 at commit b9babcd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29668/

@srowen (Member) commented Apr 5, 2015

@Raschild this is waiting on your comment and/or change. Can you make the title more descriptive please?

dstripelis changed the title from "Update load-spark-env.sh" to "Spark-Master-Standalone-Cluster-Initialization" on Apr 5, 2015
@dstripelis (Contributor, Author)

@srowen
Changed the title. Hope this one is more descriptive!

@srowen (Member) commented Apr 6, 2015

@Raschild ... hm, I would have thought something like "[MINOR] Fix assembly dir location in load-spark-env-sh" is more descriptive. Marcelo still has an outstanding question for you about BASH_SOURCE. What do you think? Sounds like it should either get a comment or a change.

I'm still a little confused about which branch this applies to. Yes, I see you're applying this to master, fine. However in 1.3, the two lines you change don't reference SPARK_HOME. I assume the net change in previous branches is just to set FWDIR differently?

I'd like to get this resolved one way or the other.

@dstripelis (Contributor, Author)

@srowen - Title updated.
I applied Marcelo's suggestion, but it still didn't work. The whole point of $FWDIR is in these two lines:
ASSEMBLY_DIR2="$FWDIR/assembly/target/scala-2.11"
ASSEMBLY_DIR1="$FWDIR/assembly/target/scala-2.10"
If you do not set the directory up properly, the if condition falls through to scala-2.10, and this is when the error occurs:
if [ -d "$ASSEMBLY_DIR2" ]; then export SPARK_SCALA_VERSION="2.11"; else export SPARK_SCALA_VERSION="2.10"; fi
Overall, from what I have seen, this applies to most of the Spark development packages. And yes, as you stated before, the net change in previous branches is just to set FWDIR differently! Otherwise, another way to work around it could be to source a script with all the global variables (i.e. $SPARK_HOME).
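
Putting the pieces together, a sketch of the intended logic with $FWDIR resolved from the script's own location (this combines the lines above; it is not the exact committed patch):

# resolve the Spark root relative to load-spark-env.sh itself,
# so it works even when the caller has not set SPARK_HOME
FWDIR="$(cd "`dirname "${BASH_SOURCE[0]}"`"/..; pwd)"

ASSEMBLY_DIR2="$FWDIR/assembly/target/scala-2.11"
ASSEMBLY_DIR1="$FWDIR/assembly/target/scala-2.10"

# pick the Scala version by checking which assembly directory exists
if [ -d "$ASSEMBLY_DIR2" ]; then
  export SPARK_SCALA_VERSION="2.11"
else
  export SPARK_SCALA_VERSION="2.10"
fi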

dstripelis changed the title from "Spark-Master-Standalone-Cluster-Initialization" to "Minor assembly directory fix in load-spark-env-sh" on Apr 8, 2015
asfgit closed this in 53f6bb1 on Apr 9, 2015
@srowen (Member) commented Apr 9, 2015

I've merged this to master; I will come back and cherry-pick into 1.3 / 1.2 after the 1.3 release, since I don't want to merge any possible functional change into the 1.3 branch this second.

asfgit pushed a commit that referenced this pull request Apr 15, 2015
Set the current dir path $FWDIR and same at $ASSEMBLY_DIR1, $ASSEMBLY_DIR2
otherwise $SPARK_HOME cannot be visible from spark-env.sh -- no SPARK_HOME variable is assigned there.
I am using the Spark-1.3.0 source code package and I come across with this when trying to start the master: sbin/start-master.sh

Author: raschild <raschild@users.noreply.github.com>

Closes #5261 from raschild/patch-1 and squashes the following commits:

b9babcd [raschild] Update load-spark-env.sh